if (!requireNamespace("BiocManager", quietly = TRUE)){
install.packages("BiocManager")}
if (!requireNamespace("edgeR", quietly = TRUE)){
BiocManager::install("edgeR")}
if (!requireNamespace("GEOquery", quietly = TRUE)){
BiocManager::install("GEOquery")}
if (!requireNamespace("ComplexHeatmap", quietly = TRUE)){
BiocManager::install("ComplexHeatmap")}
if (!requireNamespace("limma", quietly = TRUE)){
BiocManager::install("limma")}
if (!requireNamespace("ExpressionSet", quietly = TRUE)){
BiocManager::install("ExpressionSet")}
Warning: package ‘ExpressionSet’ is not available for Bioconductor version '3.16'
A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packagesWarning: downloaded length 0 != reported length 6873Warning: cannot open URL 'https://bioconductor.org/packages/3.16/data/annotation/bin/windows/contrib/4.2/PACKAGES.rds': HTTP status was '404 Not Found'Warning: downloaded length 0 != reported length 6873Warning: cannot open URL 'https://bioconductor.org/packages/3.16/data/annotation/bin/windows/contrib/4.2/PACKAGES.gz': HTTP status was '404 Not Found'Warning: downloaded length 0 != reported length 6873Warning: cannot open URL 'https://bioconductor.org/packages/3.16/data/experiment/bin/windows/contrib/4.2/PACKAGES.rds': HTTP status was '404 Not Found'Warning: downloaded length 0 != reported length 6873Warning: cannot open URL 'https://bioconductor.org/packages/3.16/data/experiment/bin/windows/contrib/4.2/PACKAGES.gz': HTTP status was '404 Not Found'Warning: downloaded length 0 != reported length 6873Warning: cannot open URL 'https://bioconductor.org/packages/3.16/workflows/bin/windows/contrib/4.2/PACKAGES.rds': HTTP status was '404 Not Found'Warning: downloaded length 0 != reported length 6873Warning: cannot open URL 'https://bioconductor.org/packages/3.16/workflows/bin/windows/contrib/4.2/PACKAGES.gz': HTTP status was '404 Not Found'Warning: downloaded length 0 != reported length 6873Warning: cannot open URL 'https://bioconductor.org/packages/3.16/books/bin/windows/contrib/4.2/PACKAGES.rds': HTTP status was '404 Not Found'Warning: downloaded length 0 != reported length 6873Warning: cannot open URL 'https://bioconductor.org/packages/3.16/books/bin/windows/contrib/4.2/PACKAGES.gz': HTTP status was '404 Not Found'
Update all/some/none? [a/s/n]:
n
if (!requireNamespace("circlize", quietly = TRUE)){
BiocManager::install("circlize")}
if (!requireNamespace("gprofiler2", quietly = TRUE))
BiocManager::install("gprofiler2")
Warning: downloaded length 0 != reported length 6873Warning: cannot open URL 'https://bioconductor.org/packages/3.16/data/annotation/bin/windows/contrib/4.2/PACKAGES.rds': HTTP status was '404 Not Found'Warning: downloaded length 0 != reported length 6873Warning: cannot open URL 'https://bioconductor.org/packages/3.16/data/annotation/bin/windows/contrib/4.2/PACKAGES.gz': HTTP status was '404 Not Found'Warning: downloaded length 0 != reported length 6873Warning: cannot open URL 'https://bioconductor.org/packages/3.16/data/experiment/bin/windows/contrib/4.2/PACKAGES.rds': HTTP status was '404 Not Found'Warning: downloaded length 0 != reported length 6873Warning: cannot open URL 'https://bioconductor.org/packages/3.16/data/experiment/bin/windows/contrib/4.2/PACKAGES.gz': HTTP status was '404 Not Found'Warning: downloaded length 0 != reported length 6873Warning: cannot open URL 'https://bioconductor.org/packages/3.16/workflows/bin/windows/contrib/4.2/PACKAGES.rds': HTTP status was '404 Not Found'Warning: downloaded length 0 != reported length 6873Warning: cannot open URL 'https://bioconductor.org/packages/3.16/workflows/bin/windows/contrib/4.2/PACKAGES.gz': HTTP status was '404 Not Found'Warning: downloaded length 0 != reported length 6873Warning: cannot open URL 'https://bioconductor.org/packages/3.16/books/bin/windows/contrib/4.2/PACKAGES.rds': HTTP status was '404 Not Found'Warning: downloaded length 0 != reported length 6873Warning: cannot open URL 'https://bioconductor.org/packages/3.16/books/bin/windows/contrib/4.2/PACKAGES.gz': HTTP status was '404 Not Found'trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.2/plotly_4.10.1.zip'
Content type 'application/zip' length 3244302 bytes (3.1 MB)
downloaded 3.1 MB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.2/gridExtra_2.3.zip'
Content type 'application/zip' length 1109470 bytes (1.1 MB)
downloaded 1.1 MB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.2/gprofiler2_0.2.1.zip'
Content type 'application/zip' length 2085460 bytes (2.0 MB)
downloaded 2.0 MB
package ‘plotly’ successfully unpacked and MD5 sums checked
package ‘gridExtra’ successfully unpacked and MD5 sums checked
package ‘gprofiler2’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\metyu\AppData\Local\Temp\Rtmpiw8lqo\downloaded_packages
Update all/some/none? [a/s/n]:
n
if (!requireNamespace("kableExtra", quietly = TRUE))
BiocManager::install("kableExtra")
Warning: downloaded length 0 != reported length 6873Warning: cannot open URL 'https://bioconductor.org/packages/3.16/data/annotation/bin/windows/contrib/4.2/PACKAGES.rds': HTTP status was '404 Not Found'Warning: downloaded length 0 != reported length 6873Warning: cannot open URL 'https://bioconductor.org/packages/3.16/data/annotation/bin/windows/contrib/4.2/PACKAGES.gz': HTTP status was '404 Not Found'Warning: downloaded length 0 != reported length 6873Warning: cannot open URL 'https://bioconductor.org/packages/3.16/data/experiment/bin/windows/contrib/4.2/PACKAGES.rds': HTTP status was '404 Not Found'Warning: downloaded length 0 != reported length 6873Warning: cannot open URL 'https://bioconductor.org/packages/3.16/data/experiment/bin/windows/contrib/4.2/PACKAGES.gz': HTTP status was '404 Not Found'Warning: downloaded length 0 != reported length 6873Warning: cannot open URL 'https://bioconductor.org/packages/3.16/workflows/bin/windows/contrib/4.2/PACKAGES.rds': HTTP status was '404 Not Found'Warning: downloaded length 0 != reported length 6873Warning: cannot open URL 'https://bioconductor.org/packages/3.16/workflows/bin/windows/contrib/4.2/PACKAGES.gz': HTTP status was '404 Not Found'Warning: downloaded length 0 != reported length 6873Warning: cannot open URL 'https://bioconductor.org/packages/3.16/books/bin/windows/contrib/4.2/PACKAGES.rds': HTTP status was '404 Not Found'Warning: downloaded length 0 != reported length 6873Warning: cannot open URL 'https://bioconductor.org/packages/3.16/books/bin/windows/contrib/4.2/PACKAGES.gz': HTTP status was '404 Not Found'trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.2/systemfonts_1.0.4.zip'
Content type 'application/zip' length 1042738 bytes (1018 KB)
downloaded 1018 KB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.2/webshot_0.5.4.zip'
Content type 'application/zip' length 216079 bytes (211 KB)
downloaded 211 KB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.2/svglite_2.1.1.zip'
Content type 'application/zip' length 667377 bytes (651 KB)
downloaded 651 KB
trying URL 'https://cran.rstudio.com/bin/windows/contrib/4.2/kableExtra_1.3.4.zip'
Content type 'application/zip' length 1853639 bytes (1.8 MB)
downloaded 1.8 MB
package ‘systemfonts’ successfully unpacked and MD5 sums checked
package ‘webshot’ successfully unpacked and MD5 sums checked
package ‘svglite’ successfully unpacked and MD5 sums checked
package ‘kableExtra’ successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\metyu\AppData\Local\Temp\Rtmpiw8lqo\downloaded_packages
Update all/some/none? [a/s/n]:
n
if (!requireNamespace("plotly", quietly = TRUE))
BiocManager::install("plotly")
if (!requireNamespace("dplyr", quietly = TRUE))
BiocManager::install("dplyr")
library(dplyr)
library(plotly)
The normalized data has already been written into a text file by the previous Assignment 1. For better assessment, I have indicated it on the file uploaded.
normalized_count_data <- read.table(file=file.path(getwd(), "normalized_counts_annotated")
, header = TRUE, stringsAsFactors = FALSE, check.names = FALSE )
knitr::kable(normalized_count_data[1:10,1:7], type="pipe")
| hgnc_symbol | YRE1A | YRE1B | YRE2A | YRG1A | YRG1B | YRG2B |
|---|---|---|---|---|---|---|
| A4GALT | 4.345648 | 3.111477 | 2.408097 | 5.369569 | 6.397003 | 6.796014 |
| AAAS | 19.317681 | 22.327817 | 21.792253 | 18.384756 | 17.419577 | 17.311220 |
| AACS | 32.913059 | 24.534108 | 24.466495 | 23.731644 | 17.621289 | 17.428989 |
| AADACP1 | 2.418554 | 4.507231 | 3.916687 | 13.734990 | 14.270060 | 14.143042 |
| AADAT | 4.058510 | 3.995899 | 3.647402 | 4.637361 | 4.052613 | 3.678086 |
| AAED1 | 14.351471 | 11.913866 | 12.242066 | 12.711562 | 8.425186 | 7.790737 |
| AAGAB | 36.922469 | 35.816951 | 39.338927 | 40.358329 | 31.836730 | 30.961949 |
| AAK1 | 1.531217 | 1.364910 | 1.534162 | 2.165459 | 2.677295 | 2.667404 |
| AAMDC | 46.807715 | 44.246410 | 48.924065 | 66.090858 | 49.702345 | 44.178113 |
| AAMP | 82.175802 | 88.352333 | 93.832788 | 76.203567 | 81.078221 | 82.443384 |
I am defining the elements of the normalized data matrix which I will be using for the heatmap later.
The heatmap uses different colors for the expression data. The heatmap has many genes clustered based on their expression at different levels such as control (EA1,EA2,EB1) or the cells treated with GLI2 (G1A,G2A,G1B).
if(min(heatmap_matrix) == 0){
heatmap_col = circlize::colorRamp2(c( 0, max(heatmap_matrix)),
c("white", "purple"))
} else {
heatmap_col = circlize::colorRamp2(c(min(heatmap_matrix), 0,
max(heatmap_matrix)), c("darkgreen", "white", "purple"))
}
first_heatmap <- ComplexHeatmap::Heatmap(as.matrix(heatmap_matrix),
show_row_dend = TRUE, show_column_dend = TRUE,
col=heatmap_col, show_column_names = TRUE,
show_row_names = FALSE, show_heatmap_legend = TRUE)
first_heatmap
SPP1 is an EMT genes It shows higher expression when treated with GLI2 compare to empty vector control cells (EV). This has been also proved by the authors. However, indicating this again would strengthen the hypothesis corresponding to Dox treated cells having GLI2 vector show higher EMT gene expression as opposed to controls having empty vector showing less expression of the genes.
GLI2_treatment_samples <- grep(colnames(normalized_count_data),pattern = "\\G")
EV_tratment_samples <- grep(colnames(normalized_count_data),pattern = "\\E")
gene_of_interest <- which(normalized_count_data$Gene_symbol == "SPP1")
YAPC_GLI2_samples <- t(normalized_count_data[gene_of_interest,5:7])
colnames(YAPC_GLI2_samples) <- c("GLI2_vector_cells")
YAPC_EV_samples <- t(normalized_count_data[gene_of_interest,2:4])
colnames(YAPC_EV_samples) <- c("Empty_Vector_cells")
YAPC_GLI2_samples
GLI2_vector_cells
YRG1A 321.9614
YRG1B 607.5096
YRG2B 580.2982
YAPC_EV_samples
Empty_Vector_cells
YRE1A 1.307170
YRE1B 1.616595
YRE2A 1.752119
GLI2 treated cells are different in terms of their expression from the empty vector cells. P value (0.03135) < 0.05. Indeed this shows there is significant difference. However, this requires to see if GLI2 treatments show difference among one another so the control cells.
t.test(x=t(YAPC_GLI2_samples), y=t(YAPC_EV_samples))
Welch Two Sample t-test
data: t(YAPC_GLI2_samples) and t(YAPC_EV_samples)
t = 5.5139, df = 2, p-value = 0.03135
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
110.2125 893.1831
sample estimates:
mean of x mean of y
503.256412 1.558628
The MDS plot allowed to visualize the difference within the different cell types Earlier examples of MDS plot showed similar results. I hypothesize that GLI2 expression is higher than control cell’s expression levels due to DOX induction of EMT genes. In general I have observed Gli2 cells to be less clustered, due to higher differential expression among GLI2 cells.
limma::plotMDS(heatmap_matrix,col=c(rep("darkgreen",3), rep("blue",3)))
The model I am building is based upon GLI2 expression status of genes that are showing higher expression with significant indications of the differential expression.
samples <- data.frame(lapply(colnames(normalized_count_data)[2:7],
FUN=function(x){unlist(strsplit(x, split = "[0-9]"))[c(1)]}))
colnames(samples) <- colnames(colnames(normalized_count_data)[2:7])
rownames(samples) <- c("Treatments")
samples <- data.frame(t(samples))
knitr::kable(samples, type = "pipe")
| Treatments |
|---|
| YRE |
| YRE |
| YRE |
| YRG |
| YRG |
| YRG |
YRG_model <- model.matrix( ~ samples$Treatments)
knitr::kable(YRG_model[1:6,], type = "pipe")
| (Intercept) | samples$TreatmentsYRG |
|---|---|
| 1 | 0 |
| 1 | 0 |
| 1 | 0 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
The differential expression data matrix corresponding to gene symbols were merged with the matrix that has the fitted model. The Benjamin-hochberg correction method was applied to adjust the expression data.
expressionMatrix <- as.matrix(normalized_count_data[,2:7])
rownames(expressionMatrix) <- normalized_count_data$Gene_symbol
colnames(expressionMatrix) <- colnames(normalized_count_data)[2:7]
set_expression_matrix <- Biobase::ExpressionSet(assayData = expressionMatrix)
fit <- limma::lmFit(set_expression_matrix, YRG_model)
fit_Bayes <- limma::eBayes(fit, trend = TRUE)
BH_method_fit <- limma::topTable(fit_Bayes,
coef = ncol(YRG_model),
adjust.method = "BH",
number = nrow(expressionMatrix))
merged_hits <- merge(normalized_count_data$Gene_symbol,
BH_method_fit,
by.y=0,by.x=1,
all.y=TRUE)
merged_hits <- merged_hits[order(merged_hits$P.Value),]
colnames(merged_hits) <- c("Gene_symbol","logFC", "AveExpr", "t",
"P.Value","adj.P.Val", "B")
rownames(merged_hits) <- 1:nrow(merged_hits)
knitr::kable(merged_hits[1:10,1:7], type = "pipe", row.names = FALSE,)
| Gene_symbol | logFC | AveExpr | t | P.Value | adj.P.Val | B |
|---|---|---|---|---|---|---|
| S100A2 | 192.67873 | 105.164286 | 63.79569 | 0e+00 | 0.0000245 | 1.820600 |
| SSFA2 | 76.45805 | 75.591839 | 45.17567 | 0e+00 | 0.0000789 | 1.781362 |
| NT5E | 86.27685 | 123.407858 | 42.93449 | 0e+00 | 0.0000789 | 1.773004 |
| S100A13 | 132.80964 | 179.285328 | 37.98876 | 0e+00 | 0.0001191 | 1.749192 |
| TRIB2 | 51.68968 | 38.062060 | 35.69100 | 1e-07 | 0.0001361 | 1.734718 |
| HEG1 | -13.44713 | 8.317468 | -31.24508 | 1e-07 | 0.0002423 | 1.697467 |
| NTN4 | 35.83705 | 33.048568 | 28.42566 | 2e-07 | 0.0003561 | 1.664696 |
| UCP2 | 80.78518 | 94.331418 | 26.66241 | 3e-07 | 0.0004488 | 1.638993 |
| PLLP | -31.42241 | 35.188668 | -25.42904 | 4e-07 | 0.0004683 | 1.617932 |
| UNC5B | 19.59576 | 12.041479 | 24.78963 | 5e-07 | 0.0004683 | 1.605831 |
The p-value was adjusted to have more stringent results. Normally significant p-value is set to 0.05. However, I wanted to be more stringent on my results and indicate the specific EMT genes. Probable genes are 741. However, there are 27 genes that have significant potential to be an EMT gene.
length(which(merged_hits$P.Value < 0.0005))
[1] 741
length(which(merged_hits$adj.P.Val < 0.0005))
[1] 27
This part of this assignment aims to detect SPP1’s role on how GLI2 cells are more differently expressed and do have higher difference than the control cells. It is evident that, with more stringent p-value, the significant portion (741) of about 11500 genes are showing features of EMT genes
model_pvalues <- data.frame(Gene_symbol = merged_hits$Gene_symbol,
pvalue = merged_hits$P.Value)
model_pvalues$color <- "black"
model_pvalues$color[model_pvalues$pvalue < 0.0005] <- "orange"
SPP1 <- normalized_count_data$Gene_symbol[which(normalized_count_data$Gene_symbol == "SPP1")]
model_pvalues$color[model_pvalues$Gene_symbol== SPP1] <- "red"
plot(model_pvalues$pvalue,
col = model_pvalues$color,
xlab = "Number of Genes within the Dataset",
ylab ="P-value",
main="P value distribution of EMT Genes")
points(model_pvalues[which(model_pvalues$Gene_symbol == "SPP1"), 1:2],
pch = 20, col="red", cex=1.5)
legend(0, 1, legend=c("SPP1"), fill=c("red"))
With more stringent gene size I was able to see cleaner difference between GLI2 and the control cells. The clustered genes are the symmetric each other across the diagonals. The high expression showing genes on the GLI2’s side is lower on the EV’ side. Top hits are assessed on the stringent p-value of 0.0005. The difference between the first and the latest heatmap shows there is significant difference in terms of gene expression whose expression is represented for each respective Treatments
top_hits <- merged_hits$Gene_symbol[merged_hits$P.Value < 0.0005]
heatmap_matrix_tophits <- t(scale(t(heatmap_matrix[which(rownames(heatmap_matrix)
%in% top_hits),])))
if(min(heatmap_matrix_tophits) == 0){
heatmap_col = circlize::colorRamp2(c( 0, max(heatmap_matrix_tophits)),
c( "white", "purple"))
} else{
heatmap_col = circlize::colorRamp2(c(min(heatmap_matrix_tophits), 0,
max(heatmap_matrix_tophits)), c("darkgreen", "white", "purple"))
}
latest_heatmap <- ComplexHeatmap::Heatmap(as.matrix(heatmap_matrix_tophits),
show_row_dend = TRUE, show_column_dend = TRUE, col=heatmap_col,
show_column_names = TRUE, show_row_names = FALSE, show_heatmap_legend = TRUE)
latest_heatmap
first_heatmap
The previously extracted GSE131222 data from gene omnibus expression (GEO) is extracted separately. The low count portions of the data filtered and a new matrix was created.
data = GEOquery::getGEOSuppFiles("GSE131222")
datanames = rownames(data)
GLI2_exp =read.delim(datanames[1],header=TRUE,check.names = FALSE)
count_per_ms = edgeR::cpm(GLI2_exp[,3:8])
rownames(count_per_ms) <- GLI2_exp[,2]
rows_filtered = rowSums(count_per_ms >1) >= 6
GLI2_exp_filtered = GLI2_exp[rows_filtered,]
filtered_GLI2_matrix <- as.matrix(GLI2_exp_filtered[,3:8])
rownames(filtered_GLI2_matrix) <- GLI2_exp_filtered$gene
head(GLI2_exp_filtered)[1:8]
The similar grouping was done previously. However this time dispersion values, normalization factors and differential expression model fitting is applied to the samples of GLI2 treatments.
grouped_treatments <- data.frame(lapply(colnames(GLI2_exp_filtered)[3:8],
FUN=function(x){unlist(strsplit(x, split = "[0-9]"))[c(1)]}))
colnames(grouped_treatments) <- colnames(GLI2_exp)[3:8]
rownames(grouped_treatments) <- c("cell_type")
grouped_treatments_normalized <- data.frame(t(grouped_treatments))
d = edgeR::DGEList(counts=filtered_GLI2_matrix, group = grouped_treatments_normalized$cell_type)
model_design_celltypes <- model.matrix(~grouped_treatments_normalized$cell_type+0)
d <- edgeR::estimateDisp(d, model_design_celltypes)
d <- edgeR::calcNormFactors(d)
fit <- edgeR::glmQLFit(d, model_design_celltypes)
qlf_test.GLI2vsEV <- edgeR::glmQLFTest(fit, coef = "grouped_treatments_normalized$cell_typeYRG")
With more stringent p-value I was able to assess the critical threshold for the expression threshold. I set the p-value as 0.0005. I decided to put it by adjusting and observing any difference. The difference between p-value 0.005 and p-value 0.0005 is one. This means this p-value is safe to assume as a threshold for significance among these differentially expressing genes.
qlf_top_hits <- edgeR::topTags(qlf_test.GLI2vsEV,sort.by = "PValue", n = nrow(filtered_GLI2_matrix))
head(qlf_top_hits)
Coefficient: grouped_treatments_normalized$cell_typeYRG
length(which(qlf_top_hits$table$PValue < 0.0005))
[1] 2434
length(which(qlf_top_hits$table$FDR < 0.0005))
[1] 2433
length(which(qlf_top_hits$table$logFC < 0))
[1] 11527
The threshold was calculated and based its value I assessed the how many genes actually show traits similar to GLI2. I used P value 1 to eliminate significant portion of the differentially expressed genes. I did not use the LogFC to distinguish between EMT GLI2 vs non-EMT GLI2, because all of logFC values are negative. This indicates there is always up regulation after the Dox treatment and GLI2 expression is always higher than the control. Pretty high confidence.
qlf_tophits_withgn <- merge(GLI2_exp[,1:1], qlf_top_hits, by.y=0,by.x=1,all.y=TRUE)
colnames(qlf_tophits_withgn) <- c("Gene_symbol", "logFC", "logCPM","F", "PValue", "FDR")
qlf_tophits_withgn[,"logarithmic"] <- (-log(qlf_tophits_withgn$PValue))
qlf_tophits_withgn <- qlf_tophits_withgn[order(qlf_tophits_withgn$Gene_symbol),]
GLI2_emt_Genes <- qlf_tophits_withgn$Gene_symbol[which(qlf_tophits_withgn$PValue < 0.0005
& qlf_tophits_withgn$logFC < 0)]
GLI2_non_emtgenes <- qlf_tophits_withgn$Gene_symbol[which(qlf_tophits_withgn$PValue == 1)]
write.table(x=GLI2_emt_Genes, file = file.path(getwd(),"data","GLI2_emt_Genes.txt"),
sep = "/", row.names = FALSE, col.names = FALSE, quote = FALSE)
write.table(x=GLI2_non_emtgenes, file = file.path(getwd(),"data","GLI2_non_emtgenes.txt"),
sep = "/", row.names = FALSE, col.names = FALSE, quote = FALSE)
The GProfiler analysis includes BH correction methods (FDR). I sticked using the same method because I used the same type of method on my previous analysis on FDR portion. For both upregualted and the downregulated genes I have used GO:BP (2022-12-04), Reactome, and WikiPathways( for annotation. These annotations allows for better asessment of my Upregualted genes in terms of defining the biologic pathways they are invovled in.
Upregulated_genes <- read.table(file=file.path(getwd(),"data" ,"GLI2_emt_Genes.txt"),header = TRUE, stringsAsFactors = FALSE, check.names = FALSE)
Upregulated_genes <- Upregulated_genes[4:nrow(Upregulated_genes),]
Upregulated_Gprofiler_validated <- gprofiler2::gost(query = Upregulated_genes,
organism = "hsapiens",
exclude_iea = TRUE,
sources = c("GO:BP", "REAC", "WP"),
correction_method = "fdr",
ordered_quer = FALSE,
)
Downregulated_genes <- read.table(file=file.path(getwd(),"data" ,"GLI2_non_emtgenes.txt"),header = TRUE, stringsAsFactors = FALSE, check.names = FALSE)
Downregulated_genes <- Downregulated_genes[13:nrow(Downregulated_genes),]
Downregulated_Gprofiler_validated <- gprofiler2::gost(query = Downregulated_genes,
organism = "hsapiens",
exclude_iea = TRUE,
correction_method = "fdr",
ordered_quer = FALSE,
source = c("GO:BP", "REAC", "WP"))
After subsetting I do have 771 genes that are down regulated. Down regualted genes are invovled in common biological pathways that are not involved in cancerous activities.
Downregulated_genes_filtered <- data.frame(
term_name = Downregulated_Gprofiler_validated$result$term_name[Downregulated_Gprofiler_validated$result$term_size < 200 & Downregulated_Gprofiler_validated$result$term_size > 1],
term_id = Downregulated_Gprofiler_validated$result$term_id[Downregulated_Gprofiler_validated$result$term_size < 200 &
Downregulated_Gprofiler_validated$result$term_size > 1],
source = Downregulated_Gprofiler_validated$result$source[Downregulated_Gprofiler_validated$result$term_size < 200 &
Downregulated_Gprofiler_validated$result$term_size > 1]
)
length(Downregulated_genes_filtered$term_name)
[1] 771
knitr::kable(Downregulated_genes_filtered$term_name[1:50], format = "html")
| x |
|---|
| tRNA metabolic process |
| cell cycle checkpoint signaling |
| DNA-templated transcription elongation |
| tRNA processing |
| regulation of DNA-templated transcription elongation |
| DNA-templated DNA replication |
| DNA integrity checkpoint signaling |
| tRNA modification |
| regulation of transcription elongation by RNA polymerase II |
| transcription elongation by RNA polymerase II |
| RNA modification |
| negative regulation of mitotic cell cycle |
| DNA damage checkpoint signaling |
| phosphatidylinositol biosynthetic process |
| signal transduction in response to DNA damage |
| mitotic cell cycle checkpoint signaling |
| protein acetylation |
| peptidyl-lysine acetylation |
| glycerophospholipid biosynthetic process |
| phosphatidylinositol metabolic process |
| negative regulation of mitotic cell cycle phase transition |
| histone acetylation |
| positive regulation of transcription elongation by RNA polymerase II |
| rRNA processing |
| internal peptidyl-lysine acetylation |
| microtubule organizing center organization |
| internal protein amino acid acetylation |
| autophagosome organization |
| mitotic DNA integrity checkpoint signaling |
| G1/S transition of mitotic cell cycle |
| nucleic acid phosphodiester bond hydrolysis |
| centrosome cycle |
| positive regulation of DNA-templated transcription, elongation |
| autophagosome assembly |
| RNA phosphodiester bond hydrolysis |
| tRNA methylation |
| protein deacetylation |
| protein deacylation |
| macromolecule deacylation |
| mitotic DNA damage checkpoint signaling |
| DNA-templated DNA replication maintenance of fidelity |
| histone deacetylation |
| RNA methylation |
| histone H3 acetylation |
| regulation of DNA repair |
| double-strand break repair via homologous recombination |
| post-Golgi vesicle-mediated transport |
| recombinational repair |
| vacuole organization |
| positive regulation of transcription initiation by RNA polymerase II |
After subsetting I do have 1128 genes that are down regulated. Up regualted genes are mostly invovled in cancer activity. Most of the pathwyas are quite invovled in cancerous activty. There is a lot apoptosis and programmed cell death activities.
Upregulated_genes_filtered <- data.frame(
term_name = Upregulated_Gprofiler_validated$result$term_name[Upregulated_Gprofiler_validated$result$term_size < 200 & Upregulated_Gprofiler_validated$result$term_size > 1],
term_id = Upregulated_Gprofiler_validated$result$term_id[Upregulated_Gprofiler_validated$result$term_size < 200 &
Upregulated_Gprofiler_validated$result$term_size > 1],
source = Upregulated_Gprofiler_validated$result$source[Upregulated_Gprofiler_validated$result$term_size < 200 &
Upregulated_Gprofiler_validated$result$term_size > 1]
)
length(Upregulated_genes_filtered$term_name)
[1] 1128
knitr::kable(Upregulated_Gprofiler_validated$result$term_name[1:50], format = "html")
| x |
|---|
| organonitrogen compound biosynthetic process |
| peptide metabolic process |
| translation |
| peptide biosynthetic process |
| organonitrogen compound metabolic process |
| amide metabolic process |
| amide biosynthetic process |
| cytoplasmic translation |
| cellular macromolecule metabolic process |
| cellular macromolecule biosynthetic process |
| protein metabolic process |
| metabolic process |
| cellular metabolic process |
| ATP synthesis coupled electron transport |
| mitochondrial ATP synthesis coupled electron transport |
| nitrogen compound metabolic process |
| respiratory electron transport chain |
| aerobic respiration |
| regulation of protein metabolic process |
| catabolic process |
| electron transport chain |
| aerobic electron transport chain |
| cellular respiration |
| oxidative phosphorylation |
| organic substance metabolic process |
| primary metabolic process |
| macromolecule catabolic process |
| protein localization |
| cellular macromolecule localization |
| macromolecule localization |
| organic substance catabolic process |
| protein catabolic process |
| establishment of protein localization |
| cellular catabolic process |
| programmed cell death |
| cell death |
| ribonucleoprotein complex biogenesis |
| cellular localization |
| organonitrogen compound catabolic process |
| apoptotic process |
| protein-containing complex organization |
| protein-containing complex assembly |
| intracellular transport |
| regulation of programmed cell death |
| generation of precursor metabolites and energy |
| energy derivation by oxidation of organic compounds |
| protein transport |
| cellular component biogenesis |
| ribose phosphate biosynthetic process |
| regulation of cell death |
NA
NA
gprofiler2::gostplot(Upregulated_Gprofiler_validated) %>% plotly::layout(title = "Upregulated genes ", font = list(size = 10))
NA
gprofiler2::gostplot(Downregulated_Gprofiler_validated) %>% plotly::layout(title = "Downregulated genes ", font = list(size = 10))
Complete_Gprofiler_Analyses <- gprofiler2::gost(query = merged_hits$Gene_symbol,
organism = "hsapiens",
exclude_iea = TRUE,
correction_method = "fdr",
sources = c("GO:BP", "REAC", "WP"))
Complete_Gprofiler_Analyses_filtered <- data.frame(term_name = Complete_Gprofiler_Analyses$result$term_name[Complete_Gprofiler_Analyses$result$term_size < 200 &
Complete_Gprofiler_Analyses$result$term_size > 1],
term_id = Complete_Gprofiler_Analyses$result$term_id[Complete_Gprofiler_Analyses$result$term_size < 200 &
Complete_Gprofiler_Analyses$result$term_size > 1],
source = Complete_Gprofiler_Analyses$result$source[Complete_Gprofiler_Analyses$result$term_size < 200 &
Complete_Gprofiler_Analyses$result$term_size > 1])
length(Complete_Gprofiler_Analyses_filtered$term_name)
[1] 1927
knitr::kable(Complete_Gprofiler_Analyses$result$term_name[1:100], format = "html")
| x |
|---|
| cellular metabolic process |
| primary metabolic process |
| metabolic process |
| cellular macromolecule metabolic process |
| nitrogen compound metabolic process |
| organic substance metabolic process |
| macromolecule metabolic process |
| cellular nitrogen compound metabolic process |
| organonitrogen compound metabolic process |
| protein metabolic process |
| heterocycle metabolic process |
| nucleobase-containing compound metabolic process |
| cellular biosynthetic process |
| cellular response to stress |
| organic substance biosynthetic process |
| cellular aromatic compound metabolic process |
| biosynthetic process |
| organic cyclic compound metabolic process |
| intracellular transport |
| nucleic acid metabolic process |
| macromolecule modification |
| cellular catabolic process |
| catabolic process |
| establishment of localization in cell |
| macromolecule catabolic process |
| macromolecule biosynthetic process |
| protein modification process |
| cellular macromolecule catabolic process |
| cellular nitrogen compound biosynthetic process |
| RNA processing |
| cellular localization |
| gene expression |
| organonitrogen compound biosynthetic process |
| cellular macromolecule biosynthetic process |
| organelle organization |
| organic substance catabolic process |
| RNA metabolic process |
| cellular process |
| cellular response to DNA damage stimulus |
| protein localization to organelle |
| regulation of cellular metabolic process |
| translation |
| cellular macromolecule localization |
| regulation of nitrogen compound metabolic process |
| protein localization |
| mitotic cell cycle |
| peptide biosynthetic process |
| amide biosynthetic process |
| regulation of primary metabolic process |
| macromolecule localization |
| protein catabolic process |
| ribonucleoprotein complex biogenesis |
| mitotic cell cycle process |
| proteolysis involved in protein catabolic process |
| peptide metabolic process |
| modification-dependent macromolecule catabolic process |
| cellular component organization or biogenesis |
| amide metabolic process |
| modification-dependent protein catabolic process |
| intracellular protein transport |
| organonitrogen compound catabolic process |
| cell cycle |
| ubiquitin-dependent protein catabolic process |
| DNA metabolic process |
| regulation of metabolic process |
| regulation of cell cycle |
| mRNA metabolic process |
| positive regulation of nitrogen compound metabolic process |
| regulation of protein metabolic process |
| establishment of protein localization |
| regulation of macromolecule metabolic process |
| cellular component biogenesis |
| positive regulation of cellular metabolic process |
| positive regulation of nucleobase-containing compound metabolic process |
| ribosome biogenesis |
| protein transport |
| DNA repair |
| cell cycle process |
| cellular component organization |
| regulation of catabolic process |
| positive regulation of metabolic process |
| proteasomal protein catabolic process |
| ncRNA metabolic process |
| proteasome-mediated ubiquitin-dependent protein catabolic process |
| positive regulation of macromolecule biosynthetic process |
| positive regulation of cellular biosynthetic process |
| positive regulation of macromolecule metabolic process |
| protein modification by small protein conjugation or removal |
| regulation of cellular catabolic process |
| positive regulation of biosynthetic process |
| heterocycle biosynthetic process |
| regulation of macromolecule biosynthetic process |
| RNA splicing |
| process utilizing autophagic mechanism |
| autophagy |
| protein modification by small protein conjugation |
| regulation of cellular biosynthetic process |
| regulation of organelle organization |
| organic cyclic compound biosynthetic process |
| establishment of protein localization to organelle |
gprofiler2::gostplot(Complete_Gprofiler_Analyses) %>% plotly::layout(title = "Complete plot plot", font = list(size = 10))
Yes the over-representation results we find is correlated to the conclusion the authors make. They conclude that genes having higher GLI2 expression, more differentially expressed, are more possible to be EMT genes. Additionally, we found there is always higher expression on GLI2 treatments given this is a strong oncogene acting in basal-sub-type switching triggering EMT switching. As it is obvious from the visualization the upregualted genes are more clustered and are much more in the downregualted genes This supports the hypothesis of authors, concluding the EMT genes to be higher expression than that of the downregualted genes.
The paper (Pasca di Magliano et al.,2006) has already indicated increase expression of GLI2 can lead to SHH gene reduction. This gene has been known to be an oncogene associated with basal-sub-type switching in PDAC. They activate GLI2 constantly which has the same expected outcome in our data set having GLI2 expressing vectors with constant Dox treatment. In conclusion these two paper findings correlate each other. They show that the EMT genes to be abundant in their biological functions and their pathway invovlement.